16 research outputs found

Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts

Author: Daelemans Walter
Tulkens Stéphan
Šuster Simon
Publication date: 01/01/2016

In this paper, we report a knowledge-based method for Word Sense Disambiguation in the domains of biomedical and clinical text. We combine word representations created on large corpora with a small number of definitions from the UMLS to create concept representations, which we then compare to representations of the context of ambiguous terms. Using no relational information, we obtain comparable performance to previous approaches on the MSH-WSD dataset, which is a well-known dataset in the biomedical domain. Additionally, our method is fast and easy to set up and extend to other domains. Supplementary materials, including source code, can be found at https://github.com/clips/yarn

Comment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical Natural Language Processing, Berlin 201
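The comparison of concept and context representations described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the averaging of word vectors, the cosine scoring, the toy 3-dimensional vectors, the concept labels (`common_cold`, `cold_temperature`) and all function names are assumptions made for illustration only.

```python
import math

# Toy word vectors; a real system would load embeddings trained on large corpora.
TOY_VECTORS = {
    "virus": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.0],
    "nose": [0.7, 0.0, 0.3],
    "low": [0.0, 0.9, 0.1],
    "temperature": [0.1, 0.8, 0.1],
    "weather": [0.0, 0.7, 0.3],
}

# Hypothetical concept labels with tokenized definitions (not real UMLS entries).
TOY_DEFINITIONS = {
    "common_cold": ["virus", "infection", "nose"],
    "cold_temperature": ["low", "temperature", "weather"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(words, word_vectors):
    """Average the vectors of the known words; return None if none is known."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def disambiguate(context_words, definitions, word_vectors):
    """Return the concept whose definition vector is closest to the context."""
    context_vec = embed(context_words, word_vectors)
    if context_vec is None:
        return None
    best_concept, best_score = None, float("-inf")
    for concept, definition_words in definitions.items():
        concept_vec = embed(definition_words, word_vectors)
        if concept_vec is None:
            continue
        score = cosine(context_vec, concept_vec)
        if score > best_score:
            best_concept, best_score = concept, score
    return best_concept


if __name__ == "__main__":
    print(disambiguate(["patient", "virus", "nose"], TOY_DEFINITIONS, TOY_VECTORS))
```

With these toy vectors, a context mentioning "virus" and "nose" selects the hypothetical `common_cold` concept, since its averaged definition vector points in nearly the same direction as the averaged context vector.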

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings

Author: Daelemans Walter
Fivez Pieter
Šuster Simon
Publication date: 01/01/2017

We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. To tune the parameters of this model, we generate self-induced spelling error corpora. We perform our experiments for two languages. For English, we greatly outperform off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequency bias of a noisy channel model, showing that neural embeddings can be successfully exploited to improve upon the state-of-the-art. For Dutch, we also outperform an off-the-shelf spelling correction tool on manually annotated clinical records from the Antwerp University Hospital, but can offer no empirical evidence that our method counters the frequency bias of a noisy channel model in this case as well. However, both our context-sensitive model and our implementation of the noisy channel model obtain high scores on the test set, establishing a state-of-the-art for Dutch clinical spelling correction with the noisy channel model.

Comment: Appears in volume 7 of the CLIN Journal, http://www.clinjournal.org/biblio/volum
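The candidate ranking step described in this abstract can be sketched as follows. The abstract does not specify the weighting scheme, so this sketch assumes that context words are weighted by the reciprocal of their distance to the misspelling; the toy 2-dimensional vectors, the window size and all function names are likewise hypothetical, and candidate generation is left out.

```python
import math

# Toy word vectors; the paper uses word and character n-gram embeddings instead.
TOY_VECTORS = {
    "has": [0.5, 0.5],
    "chest": [0.9, 0.1],
    "today": [0.4, 0.6],
    "pain": [0.8, 0.2],
    "pin": [0.1, 0.9],
    "pan": [0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_context_vector(tokens, position, word_vectors, window=2):
    """Sum context word vectors, each weighted by 1 / distance to the misspelling.

    The reciprocal-distance weighting is an assumption of this sketch.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    start = max(0, position - window)
    stop = min(len(tokens), position + window + 1)
    for i in range(start, stop):
        if i == position or tokens[i] not in word_vectors:
            continue
        weight = 1.0 / abs(i - position)
        for d, value in enumerate(word_vectors[tokens[i]]):
            total[d] += weight * value
    return total


def rank_candidates(tokens, position, candidates, word_vectors, window=2):
    """Order replacement candidates by cosine similarity to the context vector."""
    context = weighted_context_vector(tokens, position, word_vectors, window)
    scored = [
        (cosine(word_vectors[c], context), c)
        for c in candidates
        if c in word_vectors
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]


if __name__ == "__main__":
    sentence = ["patient", "has", "chest", "pian", "today"]
    print(rank_candidates(sentence, 3, ["pin", "pan", "pain"], TOY_VECTORS))
```

In this toy example the misspelling "pian" sits next to "chest", so the candidate "pain" ranks first because its vector is closest to the weighted context vector.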

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

A Short Review of Ethical Challenges in Clinical Natural Language Processing

Author: Daelemans Walter
Tulkens Stéphan
Šuster Simon
Publication date: 01/01/2017

Clinical NLP has an immense potential in contributing to how clinical practice will be revolutionized by the advent of large scale processing of clinical records. However, this potential has remained largely untapped due to slow progress primarily caused by strict data access policies for researchers. In this paper, we discuss the concern for privacy and the measures it entails. We also suggest sources of less sensitive data. Finally, we draw attention to biases that can compromise the validity of empirical research and lead to socially harmful applications.

Comment: First Workshop on Ethics in Natural Language Processing (EACL'17

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Unsupervised patient representations from clinical notes with interpretable classification decisions

Author: Daelemans Walter
Luyckx Kim
Sushil Madhumita
Šuster Simon
Publication date: 01/01/2017

We have two main contributions in this work:
1. We explore the usage of a stacked denoising autoencoder, and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. We evaluate these representations by using them as features in multiple supervised setups, and compare their performance with those of sparse representations.
2. To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model. Further, we calculate the significance of the input features of the trained classifiers when we use these pretrained representations as input.

Comment: Accepted poster at NIPS 2017 Workshop on Machine Learning for Health (https://ml4health.github.io/2017/

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Training artificial neural networks with improved error backpropagation methods: diploma thesis of the university study programme

Author: Šuster Simon
Publication venue: S. Šuster
Publication date: 26/07/2007

Digital library of University of Maribor